class: center, title-slide

.title[
# Preliminary Processing of Movement Data
]
.subtitle[
## Brazil Move 2024
]
.author[
### Elie Gurarie and Nicki Barbour
]

---

```r
# live-preview the slides while editing
xaringan::inf_mr()
```

# Movement Data

.pull-left.large[
At a minimum, movement data contain three columns:

- `X`, `Y`, `Time`

But - unless there is only one of your animal in the world - also:

- `ID`

Simple!
]

--

.pull-right.large[
**But**, in practice, there are always problems & complications. Even with just three columns:

- `Time` - crazy complicated data formatting
- `X` and `Y` - have to be **geo-referenced**
]

--

.center.large[
.red[Data always needs to be **processed** (cleaned up). Nobody enjoys this! It is **tedious** and **time-consuming**. But also very important.]
]

---

## Principles of data processing

### Smart data processing is:

- **compartmentalized** - e.g. each step a function
- **interactive** - use visualization and interactive tools
- **generalizable** - to apply to multiple individuals / multiple populations
- **replicable** - *important: NEVER overwrite the raw data!*
- **well-documented** - so you don't have to remember what you did
- **forgettable!** - so once it's done you don't need to think about it any more

---

## Some Tools

Several packages and R tools are particularly useful for data processing and clean-up:

- `plyr` - manipulating data frames and lists
  - functions: `mutate()`; `ddply()`-`ldply()`-`dlply()`
- `lubridate` - manipulating time objects
- `sf` - projecting coordinates
- `maps` and `mapdata` - quick and easy maps
- `magrittr` (or - now - the **native pipe** `|>`).

---

.pull-left[
## Example: Mountain Tapirs

*Tapirus pinchaque*

- Anta da montanha
- Tapir andino
- Sacha huagra
]

.pull-right[
.pull-right-40[
.blue[gracias Diego!]
]]

---

## Loading CSVs is easy!
```r
tapir1 <- read.csv("tapir1.csv")
```

```
##   Date.Time.TTF..ID.North.East.Zone.DOP
## 1 2006 11 03\t01:00:00\t80\t\t0\t446699.556054\t522173.062831\t19N\t3.7
## 2 2006 11 03\t02:00:00\t77\t\t0\t446632.144611\t522179.191144\t19N\t3.4
## 3 2006 11 03\t03:00:00\t66\t\t0\t446613.759672\t522314.01403\t19N\t1.8
## 4 2006 11 03\t04:01:00\t53\t\t0\t446619.887985\t522185.319457\t19N\t7.3
## 5 2006 11 03\t05:00:00\t72\t\t0\t446632.144611\t522258.859213\t19N\t1.4
## 6 2006 11 03\t06:01:00\t80\t\t0\t446619.887985\t522185.319457\t19N\t1.4
```

**what is this!?**

--

Try tapir 2:

```r
tapir2 <- read.table("tapir2.csv")
```

```
##           V1      V2  V3 V4            V5            V6   V7  V8
## 1       Date    Time TTF ID         North          East Zone DOP
## 2 12/12/2006 1:01:00  80  0 447483.980116 521210.917692  19N 1.4
## 3 12/12/2006 3:01:00  77  0 447808.780705 521180.276127  19N 1.4
## 4 12/12/2006 7:01:00  66  0 447643.316254 521033.196616  19N 1.4
## 5 12/12/2006 8:00:00  53  0  448066.16985  521082.22312  19N 1.4
## 6 12/12/2006 9:01:00  72  0 447821.037331 521088.351433  19N 1.9
```

**what is going on here!?!?**

---

.pull-left-70[
## Complete processing of raw tapir data

1. Fixes formatting
2. Fixes times
3. Combines
4. Gives spatial projection
5. Saves

```r
require(lubridate); require(plyr); require(sf); require(mapview)

# 1-3: read each tab-separated file, label the animal, parse its date format, stack
tapir <- rbind(
  read.table("tapir1.csv", sep = "\t", header = TRUE) |>
    mutate(ID = "Tapir1", Date = ymd(Date)),
  read.table("tapir2.csv", sep = "\t", header = TRUE) |>
    mutate(ID = "Tapir2", Date = dmy(Date)),
  read.table("tapir3.csv", sep = "\t", header = TRUE) |>
    mutate(ID = "Tapir3", Date = dmy(Date))) |>
  # 2: merge date and time of day into a single date-time column
  mutate(Time = ymd_hms(paste(Date, Time))) |>
  # 4: declare the coordinates as UTM zone 19N (EPSG:32619)
  st_as_sf(coords = c("East","North"), crs = 32619)

# 5: save the processed object; the raw files stay untouched
save(tapir, file = "tapir.rda")
```
]

--

.pull-right-30[
**Remember: Every data processing task is as unique and special as an (evil) snowflake!**
]

---

## Where are these tapirs?

```r
load("tapir.rda")
mapview(tapir, zcol = "ID")
```
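---

## Sketch: look at the raw text first

A minimal sketch of how one might diagnose the loading puzzle from the earlier slides, assuming the raw files are tab-separated (as the `\t` in the printout suggests). The temporary file is a stand-in built from the first two rows shown earlier, base R's `as.POSIXct()` is swapped in for `lubridate` to keep the example dependency-free, and the `UTC` time zone is an assumption.

```r
# Stand-in for a raw tapir file: tab-separated, written to a temporary location
raw <- tempfile(fileext = ".csv")
writeLines(c("Date\tTime\tNorth\tEast",
             "2006 11 03\t01:00:00\t446699.556054\t522173.062831",
             "2006 11 03\t02:00:00\t446632.144611\t522179.191144"), raw)

# Peek at the raw lines before choosing a reader
readLines(raw, n = 2)

# read.csv() splits on commas, so every row lands in a single column
bad <- read.csv(raw)
ncol(bad)

# Declaring the tab separator recovers the four columns
good <- read.table(raw, sep = "\t", header = TRUE)
names(good)

# Combine date and time of day into one date-time column (time zone assumed)
good$Time <- as.POSIXct(paste(good$Date, good$Time),
                        format = "%Y %m %d %H:%M:%S", tz = "UTC")
str(good)
```

The same check (`readLines()` first, then pick `sep`) applies to any file whose extension may not match its actual delimiter.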
---

## Movebank format helps a lot!

```r
require(move)
# replace the placeholders with your own Movebank credentials
login <- movebankLogin(username="MyUserName", password="XXXXXX")
tapir.move <- getMovebankData(study="Mountain tapir, Colombia", login=login)
```

.pull-right-70[]

---

## But even then, you need to do some tidying!

.pull-left[
You probably want a format that doesn't have all of this:

```r
names(tapir.move)
```

```
##  [1] "tag_id"               "sensor_type_id"       "gps_dop"
##  [4] "gps_time_to_fix"      "height_above_msl"     "location_lat"
##  [7] "location_long"        "timestamp"            "update_ts"
## [10] "visible"              "deployment_id"        "event_id"
## [13] "sensor_type"          "tag_local_identifier"
```

So we process to:

1. extract the columns we want
2. rename them properly
3. project to the same coordinate reference system as the first data set
4. combine together
]

--

.pull-right[
```r
# 1: keep only the columns we need
tapir2 <- tapir.move[,c("tag_id", "timestamp", "location_long", "location_lat")] |>
  # 2: match the column name used in the first data set
  plyr::rename(c(timestamp = "Time")) |>
  st_as_sf() |>
  # continue the ID numbering after Tapir1-Tapir3
  mutate(ID = paste0("Tapir", as.integer(factor(tag_id)) + 3)) |>
  # 3: reproject to the CRS of the first data set
  st_transform(st_crs(tapir))

# 4: stack the two data sets, filling missing columns with NA
tapir_all <- rbind.fill(tapir, tapir2) |> st_as_sf()
```
]

---

### Where were these tapirs?

```r
mapview(tapir_all, zcol = "ID")
```
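---

## Sketch: each step a function

To tie the tapir example back to the principles (compartmentalized, generalizable, never overwrite the raw data), here is a minimal base R sketch. The helper `process_track()`, the stand-in files, and the `UTC` time zone are all hypothetical choices made for illustration, not the workflow used on the earlier slides.

```r
# Hypothetical helper: read one raw file, label the animal, parse its timestamps.
# The raw file is only read, never modified.
process_track <- function(file, id, date_format) {
  d <- read.table(file, sep = "\t", header = TRUE)
  d$ID <- id
  d$Time <- as.POSIXct(paste(d$Date, d$Time),
                       format = paste(date_format, "%H:%M:%S"),
                       tz = "UTC")   # time zone is an assumption
  d[, c("ID", "Time", "East", "North")]
}

# Two stand-in raw files with different date formats
f1 <- tempfile(); f2 <- tempfile()
writeLines(c("Date\tTime\tNorth\tEast",
             "2006 11 03\t01:00:00\t446699.556054\t522173.062831",
             "2006 11 03\t02:00:00\t446632.144611\t522179.191144"), f1)
writeLines(c("Date\tTime\tNorth\tEast",
             "12/12/2006\t1:01:00\t447483.980116\t521210.917692",
             "12/12/2006\t3:01:00\t447808.780705\t521180.276127"), f2)

# One row of settings per individual; the same function handles all of them
specs <- data.frame(file = c(f1, f2),
                    id = c("Tapir1", "Tapir2"),
                    date_format = c("%Y %m %d", "%d/%m/%Y"),
                    stringsAsFactors = FALSE)

pieces <- lapply(seq_len(nrow(specs)), function(i)
  process_track(specs$file[i], specs$id[i], specs$date_format[i]))
tracks <- do.call(rbind, pieces)
str(tracks)
```

Adding another individual then means adding one row to `specs`, not copying and editing a block of code.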
---

## Now we practice ...

.pull-left[
### Elk (*Cervus elaphus*)
]

.pull-right[
near Banff National Park
]